a. Understand model accuracy. Why is it a performance metric for classification and not regression?
b. Calculate accuracy for a simple majority class model (this is the same as calculating the proportion of the majority class in a binary variable). Consider: x <- c(1, 1, 1, 0, 0). What is the majority class? What is the proportion of the majority class in x?
c. Fit a tree model of the target with just one predictor variable and calculate the accuracy of this model.
d. Interpret a tree model, and calculate information gain.
e. Fit a tree model of the target using all the predictors, then: create a visualization of the tree and identify the top 3 most important predictors in this model.
f. How do these models compare to majority class prediction?
g. How will you use a classification model as part of a solution to the AdviseInvest case?
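As a quick check on objective b, the toy vector can be reproduced in Python. This is only a sketch: the names `proportions`, `majority_class`, and `majority_share` are illustrative and do not appear elsewhere in the lab.

```python
import pandas as pd

# The toy binary variable from objective b
x = pd.Series([1, 1, 1, 0, 0])

# Share of each class in x
proportions = x.value_counts(normalize=True)

majority_class = proportions.idxmax()  # the most frequent value
majority_share = proportions.max()     # its proportion, i.e. majority-class accuracy

print(majority_class, majority_share)
```

Always predicting the majority class is correct exactly as often as that class occurs, which is why the proportion and the accuracy are the same number.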
import pandas as pd
import matplotlib as mpl
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_graphviz # Import Decision Tree Classifier
from sklearn.model_selection import train_test_split # Import train_test_split function
from sklearn import metrics #Import scikit-learn metrics module for accuracy calculation
from sklearn import tree
In this case we will load data from a CSV file stored in Google Drive.
See the Canvas assignments and lectures for a description of the Megatelco_leave_survey.csv data
Note: you will need to enter a code supplied by Google in the next step.
from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)
df = pd.read_csv(r'/content/gdrive/MyDrive/IS-4487/megatelco_leave_survey (1).csv')
Mounted at /content/gdrive
#look at the top rows
df.head(10)
| | college | income | overage | leftover | house | handset_price | over_15mins_calls_per_month | average_call_duration | reported_satisfaction | reported_usage_level | considering_change_of_plan | leave | id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | one | 23859 | 70 | 0 | 519105 | 154 | 5.0 | 8 | low | low | yes | LEAVE | 8183 |
| 1 | zero | 72466 | 67 | 16 | 271182 | 262 | 5.0 | 5 | low | low | yes | LEAVE | 12501 |
| 2 | zero | 30883 | 60 | 0 | 647281 | 211 | 3.0 | 8 | low | low | yes | STAY | 7425 |
| 3 | one | 44512 | 0 | 22 | 754958 | 232 | 0.0 | 5 | low | low | no | LEAVE | 13488 |
| 4 | zero | 70535 | 0 | 0 | 653421 | 310 | 0.0 | 14 | low | low | yes | STAY | 11389 |
| 5 | zero | 143987 | 0 | 56 | 896544 | 778 | 5.0 | 1 | low | high | yes | STAY | 14674 |
| 6 | one | 96668 | 79 | 24 | 259329 | 365 | 5.0 | 6 | low | avg | yes | LEAVE | 19100 |
| 7 | one | 50083 | 0 | 0 | 160335 | 266 | 5.0 | 10 | low | high | yes | STAY | 18170 |
| 8 | one | 104392 | 0 | 0 | 247836 | 778 | 1.0 | 8 | high | low | no | LEAVE | 3201 |
| 9 | one | 37852 | 0 | 74 | 264893 | 857 | 0.0 | 2 | low | low | yes | LEAVE | 12612 |
#look at the datatypes
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 13 columns):
 #   Column                       Non-Null Count  Dtype
---  ------                       --------------  -----
 0   college                      5000 non-null   object
 1   income                       5000 non-null   int64
 2   overage                      5000 non-null   int64
 3   leftover                     5000 non-null   int64
 4   house                        5000 non-null   int64
 5   handset_price                5000 non-null   int64
 6   over_15mins_calls_per_month  4997 non-null   float64
 7   average_call_duration        5000 non-null   int64
 8   reported_satisfaction        5000 non-null   object
 9   reported_usage_level         5000 non-null   object
 10  considering_change_of_plan   5000 non-null   object
 11  leave                        5000 non-null   object
 12  id                           5000 non-null   int64
dtypes: float64(1), int64(7), object(5)
memory usage: 507.9+ KB
#describe the data before cleaning it
df.describe()
| | income | overage | leftover | house | handset_price | over_15mins_calls_per_month | average_call_duration | id |
|---|---|---|---|---|---|---|---|---|
| count | 5000.000000 | 5000.000000 | 5000.000000 | 5000.000000 | 5.000000e+03 | 4997.000000 | 5000.000000 | 5000.000000 |
| mean | 79912.948400 | 85.119200 | 24.243600 | 493656.326600 | 7.876236e+02 | 7.744647 | 5.958800 | 10036.639400 |
| std | 41703.042384 | 85.655622 | 26.847496 | 254287.193865 | 2.828291e+04 | 8.806070 | 4.390417 | 5813.620304 |
| min | -28811.000000 | 0.000000 | 0.000000 | -796132.000000 | 1.300000e+02 | 0.000000 | 1.000000 | 2.000000 |
| 25% | 41592.500000 | 0.000000 | 0.000000 | 260586.500000 | 2.190000e+02 | 1.000000 | 2.000000 | 4950.500000 |
| 50% | 75041.500000 | 59.000000 | 15.000000 | 451865.500000 | 3.220000e+02 | 4.000000 | 5.000000 | 10126.000000 |
| 75% | 115475.000000 | 177.000000 | 42.000000 | 701608.750000 | 5.280000e+02 | 14.000000 | 9.000000 | 15085.250000 |
| max | 159938.000000 | 335.000000 | 89.000000 | 1000000.000000 | 2.000234e+06 | 29.000000 | 15.000000 | 20000.000000 |
Did you notice anything unusual about the "house" amounts?
How about the handset price and income?
Clean up the data in a new dataframe named "df_clean".
#delete rows with outlier data; put it in a new dataframe
df_clean = df[(df['house'] > 0) & (df['income'] > 0) & (df['handset_price'] < 1000)]
#delete any rows with missing values in the clean dataframe
df_clean = df_clean.dropna()
df_clean.describe()
| | income | overage | leftover | house | handset_price | over_15mins_calls_per_month | average_call_duration | id |
|---|---|---|---|---|---|---|---|---|
| count | 4994.000000 | 4994.000000 | 4994.000000 | 4994.000000 | 4994.000000 | 4994.000000 | 4994.000000 | 4994.000000 |
| mean | 79911.270525 | 85.114738 | 24.244694 | 493946.252903 | 387.616340 | 7.739287 | 5.957549 | 10032.925110 |
| std | 41683.689543 | 85.610045 | 26.844259 | 253599.007645 | 213.659555 | 8.802897 | 4.389439 | 5815.013219 |
| min | 20028.000000 | 0.000000 | 0.000000 | 150305.000000 | 130.000000 | 0.000000 | 1.000000 | 2.000000 |
| 25% | 41591.500000 | 0.000000 | 0.000000 | 260741.500000 | 219.000000 | 1.000000 | 2.000000 | 4943.000000 |
| 50% | 74962.500000 | 59.000000 | 15.000000 | 452087.500000 | 322.000000 | 4.000000 | 5.000000 | 10124.000000 |
| 75% | 115497.000000 | 177.000000 | 42.000000 | 701612.250000 | 528.000000 | 14.000000 | 9.000000 | 15082.750000 |
| max | 159938.000000 | 335.000000 | 89.000000 | 1000000.000000 | 899.000000 | 29.000000 | 15.000000 | 20000.000000 |
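To see how much each cleaning rule contributes, one could count the rows that fail each condition before filtering. The sketch below uses a tiny hypothetical frame (`toy`) whose values only mimic the kinds of problems visible in the summary tables; the `rules` dictionary is an illustrative helper, not part of the lab code.

```python
import pandas as pd

# Hypothetical rows that mimic the suspicious values seen in describe()
toy = pd.DataFrame({
    "house": [519105, -796132, 271182, 647281],
    "income": [23859, 72466, -28811, 30883],
    "handset_price": [154, 262, 211, 2000234],
})

# One boolean Series per rule; True means the row passes the rule
rules = {
    "house > 0": toy["house"] > 0,
    "income > 0": toy["income"] > 0,
    "handset_price < 1000": toy["handset_price"] < 1000,
}

# Count how many rows fail each rule
failures = {name: int((~mask).sum()) for name, mask in rules.items()}
print(failures)

# Keep only the rows that pass every rule
keep = pd.concat(rules, axis=1).all(axis=1)
toy_clean = toy[keep]
print(len(toy), "->", len(toy_clean))
```

Counting failures per rule makes it easier to confirm that a filter removes a handful of bad records rather than a large share of the data.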
#Get distinct values
df_clean['college'].unique()
array(['one', 'zero'], dtype=object)
df_clean['reported_satisfaction'].unique()
array(['low', 'high', 'avg'], dtype=object)
df_clean['reported_usage_level'].unique()
array(['low', 'high', 'avg'], dtype=object)
df_clean['considering_change_of_plan'].unique()
array(['yes', 'no', 'maybe'], dtype=object)
df_clean.loc[df_clean['college'] == 'one', 'college'] = "1"
df_clean.loc[df_clean['college'] == 'zero', 'college'] = "0"
df_clean.loc[df_clean['reported_satisfaction'] == 'low', 'reported_satisfaction'] = "1"
df_clean.loc[df_clean['reported_satisfaction'] == 'avg', 'reported_satisfaction'] = "2"
df_clean.loc[df_clean['reported_satisfaction'] == 'high', 'reported_satisfaction'] = "3"
df_clean.loc[df_clean['reported_usage_level'] == 'low', 'reported_usage_level'] = "1"
df_clean.loc[df_clean['reported_usage_level'] == 'avg', 'reported_usage_level'] = "2"
df_clean.loc[df_clean['reported_usage_level'] == 'high', 'reported_usage_level'] = "3"
df_clean.loc[df_clean['considering_change_of_plan'] == 'yes', 'considering_change_of_plan'] = "1"
df_clean.loc[df_clean['considering_change_of_plan'] == 'no', 'considering_change_of_plan'] = "0"
df_clean.loc[df_clean['considering_change_of_plan'] == 'maybe', 'considering_change_of_plan'] = "0.5"
df_clean['college'] = df_clean['college'].astype('int')
df_clean['reported_satisfaction'] = df_clean['reported_satisfaction'].astype('int')
df_clean['reported_usage_level'] = df_clean['reported_usage_level'].astype('int')
df_clean['considering_change_of_plan'] = df_clean['considering_change_of_plan'].astype('float')
df_clean.head(10)
| | college | income | overage | leftover | house | handset_price | over_15mins_calls_per_month | average_call_duration | reported_satisfaction | reported_usage_level | considering_change_of_plan | leave | id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 23859 | 70 | 0 | 519105 | 154 | 5.0 | 8 | 1 | 1 | 1.0 | LEAVE | 8183 |
| 1 | 0 | 72466 | 67 | 16 | 271182 | 262 | 5.0 | 5 | 1 | 1 | 1.0 | LEAVE | 12501 |
| 2 | 0 | 30883 | 60 | 0 | 647281 | 211 | 3.0 | 8 | 1 | 1 | 1.0 | STAY | 7425 |
| 3 | 1 | 44512 | 0 | 22 | 754958 | 232 | 0.0 | 5 | 1 | 1 | 0.0 | LEAVE | 13488 |
| 4 | 0 | 70535 | 0 | 0 | 653421 | 310 | 0.0 | 14 | 1 | 1 | 1.0 | STAY | 11389 |
| 5 | 0 | 143987 | 0 | 56 | 896544 | 778 | 5.0 | 1 | 1 | 3 | 1.0 | STAY | 14674 |
| 6 | 1 | 96668 | 79 | 24 | 259329 | 365 | 5.0 | 6 | 1 | 2 | 1.0 | LEAVE | 19100 |
| 7 | 1 | 50083 | 0 | 0 | 160335 | 266 | 5.0 | 10 | 1 | 3 | 1.0 | STAY | 18170 |
| 8 | 1 | 104392 | 0 | 0 | 247836 | 778 | 1.0 | 8 | 3 | 1 | 0.0 | LEAVE | 3201 |
| 9 | 1 | 37852 | 0 | 74 | 264893 | 857 | 0.0 | 2 | 1 | 1 | 1.0 | LEAVE | 12612 |
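The label-to-number recoding can also be written with lookup dictionaries and `Series.map`, which converts the values and their type in one step. This is an alternative sketch on a small hypothetical frame, not the method the lab uses; `college_map`, `level_map`, and `plan_map` are illustrative names.

```python
import pandas as pd

# Tiny hypothetical frame using the same labels as the survey columns
toy = pd.DataFrame({
    "college": ["one", "zero", "one"],
    "reported_satisfaction": ["low", "avg", "high"],
    "considering_change_of_plan": ["yes", "no", "maybe"],
})

# One lookup table per kind of label
college_map = {"zero": 0, "one": 1}
level_map = {"low": 1, "avg": 2, "high": 3}
plan_map = {"no": 0.0, "maybe": 0.5, "yes": 1.0}

# map() replaces each label with its value from the dictionary
recoded = toy.assign(
    college=toy["college"].map(college_map),
    reported_satisfaction=toy["reported_satisfaction"].map(level_map),
    considering_change_of_plan=toy["considering_change_of_plan"].map(plan_map),
)
print(recoded)
```

One caveat with this approach: any label missing from the dictionary becomes NaN, so it is worth checking `unique()` first, as the cells above do.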
#Method #1
#df_clean['leave'] = pd.Categorical(df_clean['leave'])
#Method #2
df_clean['leave'] = df_clean['leave'].astype('category')
df_clean['college'] = df_clean['college'].astype('category')
df_clean['reported_satisfaction'] = df_clean['reported_satisfaction'].astype('category')
df_clean['reported_usage_level'] = df_clean['reported_usage_level'].astype('category')
df_clean['considering_change_of_plan'] = df_clean['considering_change_of_plan'].astype('category')
df_clean.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4994 entries, 0 to 4999
Data columns (total 13 columns):
 #   Column                       Non-Null Count  Dtype
---  ------                       --------------  -----
 0   college                      4994 non-null   category
 1   income                       4994 non-null   int64
 2   overage                      4994 non-null   int64
 3   leftover                     4994 non-null   int64
 4   house                        4994 non-null   int64
 5   handset_price                4994 non-null   int64
 6   over_15mins_calls_per_month  4994 non-null   float64
 7   average_call_duration        4994 non-null   int64
 8   reported_satisfaction        4994 non-null   category
 9   reported_usage_level         4994 non-null   category
 10  considering_change_of_plan   4994 non-null   category
 11  leave                        4994 non-null   category
 12  id                           4994 non-null   int64
dtypes: category(5), float64(1), int64(7)
memory usage: 376.2 KB
What is the proportion of people who churned?
Why should we care about this proportion?
An important step in EDA is to understand the distribution of the target variable.
The majority class in the target variable will serve as an important benchmark for model performance. If we used what we'll call a "majority class classifier" (one that always predicts the majority class, which in this case is STAY), we would be correct 1 - .49 = .51, or 51%, of the time. Another way of saying this is that a majority class classifier in the MegaTelCo case would result in an accuracy of .51.
Accuracy is defined as the proportion of correctly predicted labels. It is a commonly used performance metric for evaluating classifiers.
Think about why a majority class model in this case would have an accuracy of .51.
Whatever later model we develop should have better accuracy than this performance benchmark.
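The benchmark described above can be sketched with scikit-learn's `DummyClassifier`, which ignores the features and always predicts the most frequent label. The labels here are hypothetical (51 STAY, 49 LEAVE) and chosen only to echo the rounded split in the text; they are not the survey data.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Hypothetical labels: 51 STAY and 49 LEAVE
y_toy = np.array(["STAY"] * 51 + ["LEAVE"] * 49)
X_toy = np.zeros((len(y_toy), 1))  # placeholder features; this baseline ignores them

# "most_frequent" always predicts the majority class seen during fit
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_toy, y_toy)
predictions = baseline.predict(X_toy)

# Accuracy = proportion of labels predicted correctly
baseline_accuracy = accuracy_score(y_toy, predictions)
print(baseline_accuracy)
```

Every prediction is STAY, so the baseline is right on the 51 STAY rows and wrong on the 49 LEAVE rows, which is the reasoning behind the .51 figure.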
#Add new field with binary value for leave
df_clean['leave_flag'] = df_clean['leave'].str.replace('STAY','0')
df_clean['leave_flag'] = df_clean['leave_flag'].str.replace('LEAVE','1')
#Convert to integer
df_clean['leave_flag'] = df_clean['leave_flag'].astype('int')
#Find the mean value
df_clean['leave_flag'].mean()
0.4941930316379656
Use just two variables, 'income' and 'house'. We'll call this the "money tree."
What is the accuracy of the money tree?
# split the dataframe into independent (x) and dependent (predicted) attributes (y)
x = df_clean[['income','house']]
y = df_clean['leave']
# Create Decision Tree Classifier
money_tree = DecisionTreeClassifier()
# Fit the classifier to the data
money_tree = money_tree.fit(x,y)
money_tree_text = tree.export_text(money_tree)
print(money_tree_text)
|--- feature_1 <= 600255.50 | |--- feature_1 <= 319103.00 | | |--- feature_1 <= 316869.50 | | | |--- feature_0 <= 147861.50 | | | | |--- feature_0 <= 146324.00 | | | | | |--- feature_1 <= 247683.00 | | | | | | |--- feature_1 <= 244840.50 | | | | | | | |--- feature_0 <= 22184.00 | | | | | | | | |--- feature_1 <= 231854.00 | | | | | | | | | |--- feature_0 <= 20394.00 | | | | | | | | | | |--- feature_0 <= 20114.00 | | | | | | | | | | | |--- class: STAY | | | | | | | | | | |--- feature_0 > 20114.00 | | | | | | | | | | | |--- class: LEAVE | | | | | | | | | |--- feature_0 > 20394.00 | | | | | | | | | | |--- feature_1 <= 225750.00 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- feature_1 > 225750.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- feature_1 > 231854.00 | | | | | | | | | |--- class: LEAVE | | | | | | | |--- feature_0 > 22184.00 | | | | | | | | |--- feature_0 <= 143971.00 | | | | | | | | | |--- feature_1 <= 219414.50 | | | | | | | | | | |--- feature_1 <= 218573.00 | | | | | | | | | | | |--- truncated branch of depth 42 | | | | | | | | | | |--- feature_1 > 218573.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- feature_1 > 219414.50 | | | | | | | | | | |--- feature_1 <= 220032.50 | | | | | | | | | | | |--- class: STAY | | | | | | | | | | |--- feature_1 > 220032.50 | | | | | | | | | | | |--- truncated branch of depth 23 | | | | | | | | |--- feature_0 > 143971.00 | | | | | | | | | |--- feature_1 <= 197969.50 | | | | | | | | | | |--- feature_0 <= 144159.50 | | | | | | | | | | | |--- class: STAY | | | | | | | | | | |--- feature_0 > 144159.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- feature_1 > 197969.50 | | | | | | | | | | |--- class: STAY | | | | | | |--- feature_1 > 244840.50 | | | | | | | |--- feature_1 <= 245469.00 | | | | | | | | |--- class: STAY | | | | | | | |--- feature_1 > 245469.00 | | | | | | | | |--- feature_1 <= 247525.00 | 
| | | | | | | | |--- feature_1 <= 247259.00 | | | | | | | | | | |--- feature_0 <= 60371.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- feature_0 > 60371.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- feature_1 > 247259.00 | | | | | | | | | | |--- class: LEAVE | | | | | | | | |--- feature_1 > 247525.00 | | | | | | | | | |--- class: STAY | | | | | |--- feature_1 > 247683.00 | | | | | | |--- feature_0 <= 128136.50 | | | | | | | |--- feature_0 <= 123247.00 | | | | | | | | |--- feature_1 <= 251438.50 | | | | | | | | | |--- feature_0 <= 33565.50 | | | | | | | | | | |--- class: LEAVE | | | | | | | | | |--- feature_0 > 33565.50 | | | | | | | | | | |--- feature_0 <= 48059.00 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- feature_0 > 48059.00 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | |--- feature_1 > 251438.50 | | | | | | | | | |--- feature_1 <= 251748.50 | | | | | | | | | | |--- class: STAY | | | | | | | | | |--- feature_1 > 251748.50 | | | | | | | | | | |--- feature_1 <= 315850.50 | | | | | | | | | | | |--- truncated branch of depth 29 | | | | | | | | | | |--- feature_1 > 315850.50 | | | | | | | | | | | |--- class: STAY | | | | | | | |--- feature_0 > 123247.00 | | | | | | | | |--- feature_0 <= 123876.50 | | | | | | | | | |--- class: STAY | | | | | | | | |--- feature_0 > 123876.50 | | | | | | | | | |--- feature_0 <= 124221.50 | | | | | | | | | | |--- class: LEAVE | | | | | | | | | |--- feature_0 > 124221.50 | | | | | | | | | | |--- feature_0 <= 125118.00 | | | | | | | | | | | |--- class: STAY | | | | | | | | | | |--- feature_0 > 125118.00 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | |--- feature_0 > 128136.50 | | | | | | | |--- feature_1 <= 288889.00 | | | | | | | | |--- feature_1 <= 286516.00 | | | | | | | | | |--- feature_1 <= 278456.00 | | | | | | | | | | |--- feature_1 <= 252179.00 | | | | | | | | | | | |--- truncated 
branch of depth 3 | | | | | | | | | | |--- feature_1 > 252179.00 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- feature_1 > 278456.00 | | | | | | | | | | |--- feature_1 <= 284786.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- feature_1 > 284786.50 | | | | | | | | | | | |--- class: LEAVE | | | | | | | | |--- feature_1 > 286516.00 | | | | | | | | | |--- class: STAY | | | | | | | |--- feature_1 > 288889.00 | | | | | | | | |--- feature_0 <= 137710.50 | | | | | | | | | |--- class: LEAVE | | | | | | | | |--- feature_0 > 137710.50 | | | | | | | | | |--- feature_0 <= 139456.50 | | | | | | | | | | |--- class: STAY | | | | | | | | | |--- feature_0 > 139456.50 | | | | | | | | | | |--- feature_0 <= 141299.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- feature_0 > 141299.50 | | | | | | | | | | | |--- class: LEAVE | | | | |--- feature_0 > 146324.00 | | | | | |--- feature_1 <= 230355.00 | | | | | | |--- feature_0 <= 147051.00 | | | | | | | |--- class: STAY | | | | | | |--- feature_0 > 147051.00 | | | | | | | |--- feature_0 <= 147242.00 | | | | | | | | |--- class: LEAVE | | | | | | | |--- feature_0 > 147242.00 | | | | | | | | |--- feature_1 <= 170109.00 | | | | | | | | | |--- class: STAY | | | | | | | | |--- feature_1 > 170109.00 | | | | | | | | | |--- feature_1 <= 179452.00 | | | | | | | | | | |--- class: LEAVE | | | | | | | | | |--- feature_1 > 179452.00 | | | | | | | | | | |--- feature_0 <= 147629.50 | | | | | | | | | | | |--- class: STAY | | | | | | | | | | |--- feature_0 > 147629.50 | | | | | | | | | | | |--- class: LEAVE | | | | | |--- feature_1 > 230355.00 | | | | | | |--- class: STAY | | | |--- feature_0 > 147861.50 | | | | |--- feature_1 <= 257109.50 | | | | | |--- feature_1 <= 173633.50 | | | | | | |--- feature_1 <= 156571.50 | | | | | | | |--- class: LEAVE | | | | | | |--- feature_1 > 156571.50 | | | | | | | |--- feature_1 <= 160159.50 | | | | | | | | |--- class: STAY 
| | | | | | | |--- feature_1 > 160159.50 | | | | | | | | |--- feature_0 <= 153103.00 | | | | | | | | | |--- feature_1 <= 167756.00 | | | | | | | | | | |--- feature_0 <= 148330.00 | | | | | | | | | | | |--- class: LEAVE | | | | | | | | | | |--- feature_0 > 148330.00 | | | | | | | | | | | |--- class: STAY | | | | | | | | | |--- feature_1 > 167756.00 | | | | | | | | | | |--- class: LEAVE | | | | | | | | |--- feature_0 > 153103.00 | | | | | | | | | |--- feature_0 <= 157667.00 | | | | | | | | | | |--- class: STAY | | | | | | | | | |--- feature_0 > 157667.00 | | | | | | | | | | |--- class: LEAVE | | | | | |--- feature_1 > 173633.50 | | | | | | |--- feature_0 <= 158635.00 | | | | | | | |--- feature_0 <= 158513.50 | | | | | | | | |--- feature_1 <= 199572.50 | | | | | | | | | |--- feature_0 <= 157650.50 | | | | | | | | | | |--- class: LEAVE | | | | | | | | | |--- feature_0 > 157650.50 | | | | | | | | | | |--- class: STAY | | | | | | | | |--- feature_1 > 199572.50 | | | | | | | | | |--- feature_1 <= 204148.50 | | | | | | | | | | |--- feature_0 <= 151554.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- feature_0 > 151554.50 | | | | | | | | | | | |--- class: STAY | | | | | | | | | |--- feature_1 > 204148.50 | | | | | | | | | | |--- feature_0 <= 156564.00 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- feature_0 > 156564.00 | | | | | | | | | | | |--- class: LEAVE | | | | | | | |--- feature_0 > 158513.50 | | | | | | | | |--- class: STAY | | | | | | |--- feature_0 > 158635.00 | | | | | | | |--- class: LEAVE | | | | |--- feature_1 > 257109.50 | | | | | |--- feature_1 <= 263053.00 | | | | | | |--- class: STAY | | | | | |--- feature_1 > 263053.00 | | | | | | |--- feature_1 <= 269738.50 | | | | | | | |--- class: LEAVE | | | | | | |--- feature_1 > 269738.50 | | | | | | | |--- feature_1 <= 300326.00 | | | | | | | | |--- feature_1 <= 295900.00 | | | | | | | | | |--- feature_1 <= 293976.50 | | | | | | | | | | |--- 
feature_0 <= 150745.00 | | | | | | | | | | | |--- class: LEAVE | | | | | | | | | | |--- feature_0 > 150745.00 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- feature_1 > 293976.50 | | | | | | | | | | |--- class: LEAVE | | | | | | | | |--- feature_1 > 295900.00 | | | | | | | | | |--- class: STAY | | | | | | | |--- feature_1 > 300326.00 | | | | | | | | |--- feature_0 <= 149452.50 | | | | | | | | | |--- feature_0 <= 148224.00 | | | | | | | | | | |--- class: LEAVE | | | | | | | | | |--- feature_0 > 148224.00 | | | | | | | | | | |--- class: STAY | | | | | | | | |--- feature_0 > 149452.50 | | | | | | | | | |--- feature_0 <= 157476.00 | | | | | | | | | | |--- class: LEAVE | | | | | | | | | |--- feature_0 > 157476.00 | | | | | | | | | | |--- class: STAY | | |--- feature_1 > 316869.50 | | | |--- feature_0 <= 130833.50 | | | | |--- feature_1 <= 318148.00 | | | | | |--- class: STAY | | | | |--- feature_1 > 318148.00 | | | | | |--- feature_1 <= 318494.50 | | | | | | |--- feature_0 <= 94274.50 | | | | | | | |--- class: LEAVE | | | | | | |--- feature_0 > 94274.50 | | | | | | | |--- class: STAY | | | | | |--- feature_1 > 318494.50 | | | | | | |--- class: STAY | | | |--- feature_0 > 130833.50 | | | | |--- feature_0 <= 136283.50 | | | | | |--- class: LEAVE | | | | |--- feature_0 > 136283.50 | | | | | |--- feature_1 <= 318498.50 | | | | | | |--- class: STAY | | | | | |--- feature_1 > 318498.50 | | | | | | |--- class: LEAVE | |--- feature_1 > 319103.00 | | |--- feature_0 <= 20181.00 | | | |--- class: STAY | | |--- feature_0 > 20181.00 | | | |--- feature_0 <= 21461.50 | | | | |--- feature_1 <= 569388.00 | | | | | |--- feature_0 <= 20312.00 | | | | | | |--- feature_0 <= 20255.50 | | | | | | | |--- class: LEAVE | | | | | | |--- feature_0 > 20255.50 | | | | | | | |--- class: STAY | | | | | |--- feature_0 > 20312.00 | | | | | | |--- class: LEAVE | | | | |--- feature_1 > 569388.00 | | | | | |--- feature_0 <= 20402.00 | | | | | | |--- class: LEAVE | | | | | 
|--- feature_0 > 20402.00 | | | | | | |--- class: STAY | | | |--- feature_0 > 21461.50 | | | | |--- feature_0 <= 21608.50 | | | | | |--- class: STAY | | | | |--- feature_0 > 21608.50 | | | | | |--- feature_1 <= 326096.50 | | | | | | |--- feature_0 <= 89423.50 | | | | | | | |--- feature_1 <= 323587.00 | | | | | | | | |--- feature_0 <= 81315.50 | | | | | | | | | |--- feature_1 <= 321198.50 | | | | | | | | | | |--- class: LEAVE | | | | | | | | | |--- feature_1 > 321198.50 | | | | | | | | | | |--- feature_0 <= 66030.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- feature_0 > 66030.50 | | | | | | | | | | | |--- class: LEAVE | | | | | | | | |--- feature_0 > 81315.50 | | | | | | | | | |--- class: STAY | | | | | | | |--- feature_1 > 323587.00 | | | | | | | | |--- feature_1 <= 325306.50 | | | | | | | | | |--- class: LEAVE | | | | | | | | |--- feature_1 > 325306.50 | | | | | | | | | |--- feature_0 <= 54329.50 | | | | | | | | | | |--- class: STAY | | | | | | | | | |--- feature_0 > 54329.50 | | | | | | | | | | |--- class: LEAVE | | | | | | |--- feature_0 > 89423.50 | | | | | | | |--- feature_1 <= 319821.00 | | | | | | | | |--- class: STAY | | | | | | | |--- feature_1 > 319821.00 | | | | | | | | |--- class: LEAVE | | | | | |--- feature_1 > 326096.50 | | | | | | |--- feature_1 <= 334520.00 | | | | | | | |--- feature_0 <= 104510.00 | | | | | | | | |--- feature_0 <= 63982.00 | | | | | | | | | |--- feature_1 <= 327946.00 | | | | | | | | | | |--- class: LEAVE | | | | | | | | | |--- feature_1 > 327946.00 | | | | | | | | | | |--- feature_0 <= 26653.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- feature_0 > 26653.00 | | | | | | | | | | | |--- class: STAY | | | | | | | | |--- feature_0 > 63982.00 | | | | | | | | | |--- class: LEAVE | | | | | | | |--- feature_0 > 104510.00 | | | | | | | | |--- feature_1 <= 332032.50 | | | | | | | | | |--- class: STAY | | | | | | | | |--- feature_1 > 332032.50 | | | | | | | | | |--- 
feature_0 <= 126997.00 | | | | | | | | | | |--- class: LEAVE | | | | | | | | | |--- feature_0 > 126997.00 | | | | | | | | | | |--- feature_0 <= 145341.00 | | | | | | | | | | | |--- class: STAY | | | | | | | | | | |--- feature_0 > 145341.00 | | | | | | | | | | | |--- class: LEAVE | | | | | | |--- feature_1 > 334520.00 | | | | | | | |--- feature_1 <= 342372.00 | | | | | | | | |--- feature_1 <= 340083.50 | | | | | | | | | |--- feature_1 <= 340010.50 | | | | | | | | | | |--- feature_0 <= 146136.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | | |--- feature_0 > 146136.50 | | | | | | | | | | | |--- class: STAY | | | | | | | | | |--- feature_1 > 340010.50 | | | | | | | | | | |--- class: STAY | | | | | | | | |--- feature_1 > 340083.50 | | | | | | | | | |--- feature_0 <= 31832.00 | | | | | | | | | | |--- class: STAY | | | | | | | | | |--- feature_0 > 31832.00 | | | | | | | | | | |--- class: LEAVE | | | | | | | |--- feature_1 > 342372.00 | | | | | | | | |--- feature_1 <= 376510.50 | | | | | | | | | |--- feature_1 <= 356137.00 | | | | | | | | | | |--- feature_1 <= 342983.00 | | | | | | | | | | | |--- class: STAY | | | | | | | | | | |--- feature_1 > 342983.00 | | | | | | | | | | | |--- truncated branch of depth 11 | | | | | | | | | |--- feature_1 > 356137.00 | | | | | | | | | | |--- feature_1 <= 356527.50 | | | | | | | | | | | |--- class: STAY | | | | | | | | | | |--- feature_1 > 356527.50 | | | | | | | | | | | |--- truncated branch of depth 14 | | | | | | | | |--- feature_1 > 376510.50 | | | | | | | | | |--- feature_1 <= 388111.00 | | | | | | | | | | |--- feature_1 <= 386645.00 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | | |--- feature_1 > 386645.00 | | | | | | | | | | | |--- class: LEAVE | | | | | | | | | |--- feature_1 > 388111.00 | | | | | | | | | | |--- feature_1 <= 389321.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- feature_1 > 389321.50 | | | | | | | | | | | |--- truncated 
branch of depth 37 |--- feature_1 > 600255.50 | |--- feature_0 <= 99826.00 | | |--- feature_1 <= 972711.50 | | | |--- feature_1 <= 924480.50 | | | | |--- feature_1 <= 817165.50 | | | | | |--- feature_1 <= 816584.00 | | | | | | |--- feature_0 <= 99373.00 | | | | | | | |--- feature_0 <= 21765.50 | | | | | | | | |--- feature_1 <= 650539.00 | | | | | | | | | |--- feature_1 <= 639671.50 | | | | | | | | | | |--- class: STAY | | | | | | | | | |--- feature_1 > 639671.50 | | | | | | | | | | |--- class: LEAVE | | | | | | | | |--- feature_1 > 650539.00 | | | | | | | | | |--- class: STAY | | | | | | | |--- feature_0 > 21765.50 | | | | | | | | |--- feature_0 <= 21932.00 | | | | | | | | | |--- class: LEAVE | | | | | | | | |--- feature_0 > 21932.00 | | | | | | | | | |--- feature_0 <= 22419.50 | | | | | | | | | | |--- class: STAY | | | | | | | | | |--- feature_0 > 22419.50 | | | | | | | | | | |--- feature_0 <= 23479.00 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- feature_0 > 23479.00 | | | | | | | | | | | |--- truncated branch of depth 20 | | | | | | |--- feature_0 > 99373.00 | | | | | | | |--- class: LEAVE | | | | | |--- feature_1 > 816584.00 | | | | | | |--- class: LEAVE | | | | |--- feature_1 > 817165.50 | | | | | |--- feature_0 <= 20781.50 | | | | | | |--- feature_1 <= 857573.50 | | | | | | | |--- class: STAY | | | | | | |--- feature_1 > 857573.50 | | | | | | | |--- class: LEAVE | | | | | |--- feature_0 > 20781.50 | | | | | | |--- feature_0 <= 37637.00 | | | | | | | |--- feature_1 <= 920305.50 | | | | | | | | |--- feature_0 <= 30993.50 | | | | | | | | | |--- feature_0 <= 30631.50 | | | | | | | | | | |--- feature_1 <= 886968.00 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- feature_1 > 886968.00 | | | | | | | | | | | |--- class: STAY | | | | | | | | | |--- feature_0 > 30631.50 | | | | | | | | | | |--- class: LEAVE | | | | | | | | |--- feature_0 > 30993.50 | | | | | | | | | |--- class: STAY | | | | | | | |--- 
feature_1 > 920305.50 | | | | | | | | |--- feature_1 <= 921775.00 | | | | | | | | | |--- class: LEAVE | | | | | | | | |--- feature_1 > 921775.00 | | | | | | | | | |--- class: STAY | | | | | | |--- feature_0 > 37637.00 | | | | | | | |--- feature_0 <= 38210.00 | | | | | | | | |--- class: LEAVE | | | | | | | |--- feature_0 > 38210.00 | | | | | | | | |--- feature_1 <= 846899.50 | | | | | | | | | |--- feature_0 <= 93635.50 | | | | | | | | | | |--- feature_1 <= 832932.50 | | | | | | | | | | | |--- class: STAY | | | | | | | | | | |--- feature_1 > 832932.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- feature_0 > 93635.50 | | | | | | | | | | |--- feature_0 <= 95378.00 | | | | | | | | | | | |--- class: LEAVE | | | | | | | | | | |--- feature_0 > 95378.00 | | | | | | | | | | | |--- class: STAY | | | | | | | | |--- feature_1 > 846899.50 | | | | | | | | | |--- feature_1 <= 847843.00 | | | | | | | | | | |--- class: LEAVE | | | | | | | | | |--- feature_1 > 847843.00 | | | | | | | | | | |--- feature_0 <= 69657.00 | | | | | | | | | | | |--- truncated branch of depth 14 | | | | | | | | | | |--- feature_0 > 69657.00 | | | | | | | | | | | |--- truncated branch of depth 9 | | | |--- feature_1 > 924480.50 | | | | |--- feature_1 <= 934710.50 | | | | | |--- feature_1 <= 925641.50 | | | | | | |--- class: LEAVE | | | | | |--- feature_1 > 925641.50 | | | | | | |--- feature_0 <= 80700.50 | | | | | | | |--- feature_0 <= 57594.50 | | | | | | | | |--- feature_0 <= 52208.00 | | | | | | | | | |--- feature_1 <= 927511.50 | | | | | | | | | | |--- class: STAY | | | | | | | | | |--- feature_1 > 927511.50 | | | | | | | | | | |--- feature_0 <= 35909.00 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- feature_0 > 35909.00 | | | | | | | | | | | |--- class: LEAVE | | | | | | | | |--- feature_0 > 52208.00 | | | | | | | | | |--- class: LEAVE | | | | | | | |--- feature_0 > 57594.50 | | | | | | | | |--- class: STAY | | | | | | |--- feature_0 > 
80700.50 | | | | | | | |--- feature_1 <= 926780.50 | | | | | | | | |--- class: STAY | | | | | | | |--- feature_1 > 926780.50 | | | | | | | | |--- class: LEAVE | | | | |--- feature_1 > 934710.50 | | | | | |--- feature_1 <= 971726.00 | | | | | | |--- feature_0 <= 44378.50 | | | | | | | |--- feature_1 <= 946812.50 | | | | | | | | |--- feature_0 <= 23339.50 | | | | | | | | | |--- feature_0 <= 22806.50 | | | | | | | | | | |--- class: STAY | | | | | | | | | |--- feature_0 > 22806.50 | | | | | | | | | | |--- class: LEAVE | | | | | | | | |--- feature_0 > 23339.50 | | | | | | | | | |--- class: STAY | | | | | | | |--- feature_1 > 946812.50 | | | | | | | | |--- feature_1 <= 953457.00 | | | | | | | | | |--- feature_0 <= 34595.50 | | | | | | | | | | |--- feature_0 <= 23758.50 | | | | | | | | | | | |--- class: LEAVE | | | | | | | | | | |--- feature_0 > 23758.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- feature_0 > 34595.50 | | | | | | | | | | |--- feature_0 <= 40543.00 | | | | | | | | | | | |--- class: LEAVE | | | | | | | | | | |--- feature_0 > 40543.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- feature_1 > 953457.00 | | | | | | | | | |--- feature_0 <= 24219.50 | | | | | | | | | | |--- class: LEAVE | | | | | | | | | |--- feature_0 > 24219.50 | | | | | | | | | | |--- feature_0 <= 41984.00 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- feature_0 > 41984.00 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | |--- feature_0 > 44378.50 | | | | | | | |--- feature_0 <= 74381.50 | | | | | | | | |--- feature_0 <= 71768.00 | | | | | | | | | |--- feature_0 <= 64827.50 | | | | | | | | | | |--- feature_1 <= 939160.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- feature_1 > 939160.00 | | | | | | | | | | | |--- class: STAY | | | | | | | | | |--- feature_0 > 64827.50 | | | | | | | | | | |--- feature_0 <= 68946.00 | | | | | | | | | | | |--- 
truncated branch of depth 2 | | | | | | | | | | |--- feature_0 > 68946.00 | | | | | | | | | | | |--- class: STAY | | | | | | | | |--- feature_0 > 71768.00 | | | | | | | | | |--- class: LEAVE | | | | | | | |--- feature_0 > 74381.50 | | | | | | | | |--- feature_1 <= 959715.00 | | | | | | | | | |--- class: STAY | | | | | | | | |--- feature_1 > 959715.00 | | | | | | | | | |--- feature_1 <= 960980.00 | | | | | | | | | | |--- class: LEAVE | | | | | | | | | |--- feature_1 > 960980.00 | | | | | | | | | | |--- class: STAY | | | | | |--- feature_1 > 971726.00 | | | | | | |--- feature_1 <= 971898.50 | | | | | | | |--- class: LEAVE | | | | | | |--- feature_1 > 971898.50 | | | | | | | |--- feature_0 <= 22894.50 | | | | | | | | |--- class: LEAVE | | | | | | | |--- feature_0 > 22894.50 | | | | | | | | |--- class: STAY | | |--- feature_1 > 972711.50 | | | |--- feature_0 <= 85411.50 | | | | |--- feature_0 <= 34256.50 | | | | | |--- feature_0 <= 33698.50 | | | | | | |--- feature_1 <= 985019.00 | | | | | | | |--- feature_1 <= 982416.50 | | | | | | | | |--- class: STAY | | | | | | | |--- feature_1 > 982416.50 | | | | | | | | |--- class: LEAVE | | | | | | |--- feature_1 > 985019.00 | | | | | | | |--- class: STAY | | | | | |--- feature_0 > 33698.50 | | | | | | |--- class: LEAVE | | | | |--- feature_0 > 34256.50 | | | | | |--- feature_1 <= 998339.00 | | | | | | |--- class: STAY | | | | | |--- feature_1 > 998339.00 | | | | | | |--- feature_1 <= 998920.00 | | | | | | | |--- class: LEAVE | | | | | | |--- feature_1 > 998920.00 | | | | | | | |--- class: STAY | | | |--- feature_0 > 85411.50 | | | | |--- feature_0 <= 93270.00 | | | | | |--- feature_1 <= 991880.00 | | | | | | |--- feature_1 <= 985684.50 | | | | | | | |--- feature_0 <= 92023.00 | | | | | | | | |--- class: STAY | | | | | | | |--- feature_0 > 92023.00 | | | | | | | | |--- class: LEAVE | | | | | | |--- feature_1 > 985684.50 | | | | | | | |--- class: LEAVE | | | | | |--- feature_1 > 991880.00 | | | | | | |--- class: STAY | | | | |--- 
feature_0 > 93270.00 | | | | | |--- class: STAY | |--- feature_0 > 99826.00 | | |--- feature_1 <= 680879.50 | | | |--- feature_1 <= 664908.50 | | | | |--- feature_0 <= 145073.50 | | | | | |--- feature_0 <= 132494.50 | | | | | | |--- feature_0 <= 128797.50 | | | | | | | |--- feature_0 <= 127959.00 | | | | | | | | |--- feature_1 <= 657127.50 | | | | | | | | | |--- feature_1 <= 603488.00 | | | | | | | | | | |--- class: STAY | | | | | | | | | |--- feature_1 > 603488.00 | | | | | | | | | | |--- feature_0 <= 108983.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- feature_0 > 108983.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | |--- feature_1 > 657127.50 | | | | | | | | | |--- class: LEAVE | | | | | | | |--- feature_0 > 127959.00 | | | | | | | | |--- class: STAY | | | | | | |--- feature_0 > 128797.50 | | | | | | | |--- class: LEAVE | | | | | |--- feature_0 > 132494.50 | | | | | | |--- feature_0 <= 134659.50 | | | | | | | |--- class: STAY | | | | | | |--- feature_0 > 134659.50 | | | | | | | |--- feature_0 <= 135359.50 | | | | | | | | |--- class: LEAVE | | | | | | | |--- feature_0 > 135359.50 | | | | | | | | |--- feature_1 <= 621022.50 | | | | | | | | | |--- class: STAY | | | | | | | | |--- feature_1 > 621022.50 | | | | | | | | | |--- feature_1 <= 627247.00 | | | | | | | | | | |--- class: LEAVE | | | | | | | | | |--- feature_1 > 627247.00 | | | | | | | | | | |--- feature_1 <= 642405.00 | | | | | | | | | | | |--- class: STAY | | | | | | | | | | |--- feature_1 > 642405.00 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | |--- feature_0 > 145073.50 | | | | | |--- feature_1 <= 611945.00 | | | | | | |--- feature_1 <= 609626.00 | | | | | | | |--- feature_0 <= 150541.50 | | | | | | | | |--- class: STAY | | | | | | | |--- feature_0 > 150541.50 | | | | | | | | |--- class: LEAVE | | | | | | |--- feature_1 > 609626.00 | | | | | | | |--- class: STAY | | | | | |--- feature_1 > 611945.00 | | | | | | |--- feature_0 
<= 152955.00 | | | | | | | |--- feature_0 <= 146093.00 | | | | | | | | |--- feature_0 <= 146034.50 | | | | | | | | | |--- class: LEAVE | | | | | | | | |--- feature_0 > 146034.50 | | | | | | | | | |--- class: STAY | | | | | | | |--- feature_0 > 146093.00 | | | | | | | | |--- class: LEAVE | | | | | | |--- feature_0 > 152955.00 | | | | | | | |--- feature_1 <= 649088.00 | | | | | | | | |--- feature_0 <= 156795.50 | | | | | | | | | |--- feature_0 <= 156715.50 | | | | | | | | | | |--- class: LEAVE | | | | | | | | | |--- feature_0 > 156715.50 | | | | | | | | | | |--- class: STAY | | | | | | | | |--- feature_0 > 156795.50 | | | | | | | | | |--- class: LEAVE | | | | | | | |--- feature_1 > 649088.00 | | | | | | | | |--- class: STAY | | | |--- feature_1 > 664908.50 | | | | |--- feature_0 <= 113603.00 | | | | | |--- feature_0 <= 111235.00 | | | | | | |--- class: LEAVE | | | | | |--- feature_0 > 111235.00 | | | | | | |--- class: STAY | | | | |--- feature_0 > 113603.00 | | | | | |--- class: LEAVE | | |--- feature_1 > 680879.50 | | | |--- feature_1 <= 686139.00 | | | | |--- feature_1 <= 681452.50 | | | | | |--- feature_1 <= 681230.50 | | | | | | |--- class: STAY | | | | | |--- feature_1 > 681230.50 | | | | | | |--- class: LEAVE | | | | |--- feature_1 > 681452.50 | | | | | |--- class: STAY | | | |--- feature_1 > 686139.00 | | | | |--- feature_0 <= 158903.50 | | | | | |--- feature_1 <= 688547.00 | | | | | | |--- class: LEAVE | | | | | |--- feature_1 > 688547.00 | | | | | | |--- feature_1 <= 692485.50 | | | | | | | |--- feature_0 <= 113341.00 | | | | | | | | |--- class: LEAVE | | | | | | | |--- feature_0 > 113341.00 | | | | | | | | |--- class: STAY | | | | | | |--- feature_1 > 692485.50 | | | | | | | |--- feature_1 <= 725395.00 | | | | | | | | |--- feature_1 <= 713548.50 | | | | | | | | | |--- feature_1 <= 711182.00 | | | | | | | | | | |--- feature_0 <= 101860.00 | | | | | | | | | | | |--- class: STAY | | | | | | | | | | |--- feature_0 > 101860.00 | | | | | | | | | | | |--- 
truncated branch of depth 9 | | | | | | | | | |--- feature_1 > 711182.00 | | | | | | | | | | |--- class: STAY | | | | | | | | |--- feature_1 > 713548.50 | | | | | | | | | |--- feature_0 <= 110569.00 | | | | | | | | | | |--- feature_0 <= 109029.00 | | | | | | | | | | | |--- class: LEAVE | | | | | | | | | | |--- feature_0 > 109029.00 | | | | | | | | | | | |--- class: STAY | | | | | | | | | |--- feature_0 > 110569.00 | | | | | | | | | | |--- class: LEAVE | | | | | | | |--- feature_1 > 725395.00 | | | | | | | | |--- feature_0 <= 141221.50 | | | | | | | | | |--- feature_0 <= 139544.00 | | | | | | | | | | |--- feature_0 <= 136258.00 | | | | | | | | | | | |--- truncated branch of depth 26 | | | | | | | | | | |--- feature_0 > 136258.00 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- feature_0 > 139544.00 | | | | | | | | | | |--- class: LEAVE | | | | | | | | |--- feature_0 > 141221.50 | | | | | | | | | |--- feature_0 <= 146853.00 | | | | | | | | | | |--- feature_0 <= 144086.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- feature_0 > 144086.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- feature_0 > 146853.00 | | | | | | | | | | |--- feature_0 <= 149801.00 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- feature_0 > 149801.00 | | | | | | | | | | | |--- truncated branch of depth 12 | | | | |--- feature_0 > 158903.50 | | | | | |--- feature_1 <= 821752.50 | | | | | | |--- class: LEAVE | | | | | |--- feature_1 > 821752.50 | | | | | | |--- feature_1 <= 843422.50 | | | | | | | |--- class: STAY | | | | | | |--- feature_1 > 843422.50 | | | | | | | |--- class: LEAVE
What is the accuracy of the money_tree? Use these steps to calculate accuracy.
Is this model overfitted?
pred = money_tree.predict(x)
#print(pred)
print("Accuracy:",metrics.accuracy_score(y, pred))
Accuracy: 1.0
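An accuracy of 1.0 measured on the same rows the tree was fitted on usually means the tree has memorized the data rather than learned a pattern. The sketch below illustrates this on synthetic data whose labels are pure noise. `X_demo`, `y_demo`, and `deep_tree` are hypothetical stand-ins, not the Megatelco data or `money_tree`.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn import metrics

# Synthetic stand-in data (assumption): two numeric features, labels assigned at random
rng = np.random.default_rng(0)
X_demo = rng.uniform(20_000, 160_000, size=(1000, 2))
y_demo = rng.choice(["LEAVE", "STAY"], size=1000)

# Hold out 20% of the rows so the tree never sees them during fitting
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

# No max_depth: the tree keeps splitting until every training row is classified correctly
deep_tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X_tr, y_tr)

train_acc = metrics.accuracy_score(y_tr, deep_tree.predict(X_tr))
test_acc = metrics.accuracy_score(y_te, deep_tree.predict(X_te))
print("Training accuracy:", train_acc)
print("Test accuracy:", test_acc)
```

Because the labels carry no signal, the perfect training accuracy cannot carry over to the held-out rows. A large gap between the two numbers is the usual symptom of overfitting.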
Limit the depth of the tree. The code below uses max_depth=10.
money_tree2 = DecisionTreeClassifier(criterion="entropy", max_depth=10)
# Fit the decision tree classifier
money_tree2 = money_tree2.fit(x,y)
money_tree2_text = tree.export_text(money_tree2)
print(money_tree2_text)
|--- feature_1 <= 600255.50 | |--- feature_1 <= 319103.00 | | |--- feature_1 <= 316869.50 | | | |--- feature_0 <= 147861.50 | | | | |--- feature_0 <= 146324.00 | | | | | |--- feature_0 <= 146155.00 | | | | | | |--- feature_0 <= 146076.50 | | | | | | | |--- feature_1 <= 248232.50 | | | | | | | | |--- feature_1 <= 248021.00 | | | | | | | | | |--- feature_1 <= 247683.00 | | | | | | | | | | |--- class: LEAVE | | | | | | | | | |--- feature_1 > 247683.00 | | | | | | | | | | |--- class: LEAVE | | | | | | | | |--- feature_1 > 248021.00 | | | | | | | | | |--- class: STAY | | | | | | | |--- feature_1 > 248232.50 | | | | | | | | |--- feature_1 <= 249311.50 | | | | | | | | | |--- feature_0 <= 79915.50 | | | | | | | | | | |--- class: LEAVE | | | | | | | | | |--- feature_0 > 79915.50 | | | | | | | | | | |--- class: LEAVE | | | | | | | | |--- feature_1 > 249311.50 | | | | | | | | | |--- feature_0 <= 128136.50 | | | | | | | | | | |--- class: LEAVE | | | | | | | | | |--- feature_0 > 128136.50 | | | | | | | | | | |--- class: LEAVE | | | | | | |--- feature_0 > 146076.50 | | | | | | | |--- class: STAY | | | | | |--- feature_0 > 146155.00 | | | | | | |--- class: LEAVE | | | | |--- feature_0 > 146324.00 | | | | | |--- feature_1 <= 230355.00 | | | | | | |--- feature_1 <= 170109.00 | | | | | | | |--- class: STAY | | | | | | |--- feature_1 > 170109.00 | | | | | | | |--- feature_0 <= 147051.00 | | | | | | | | |--- class: STAY | | | | | | | |--- feature_0 > 147051.00 | | | | | | | | |--- feature_0 <= 147242.00 | | | | | | | | | |--- class: LEAVE | | | | | | | | |--- feature_0 > 147242.00 | | | | | | | | | |--- feature_0 <= 147301.50 | | | | | | | | | | |--- class: STAY | | | | | | | | | |--- feature_0 > 147301.50 | | | | | | | | | | |--- class: LEAVE | | | | | |--- feature_1 > 230355.00 | | | | | | |--- class: STAY | | | |--- feature_0 > 147861.50 | | | | |--- feature_1 <= 257109.50 | | | | | |--- feature_0 <= 158635.00 | | | | | | |--- feature_0 <= 148332.00 | | | | | | | |--- class: LEAVE 
| | | | | | |--- feature_0 > 148332.00 | | | | | | | |--- feature_0 <= 148429.00 | | | | | | | | |--- class: STAY | | | | | | | |--- feature_0 > 148429.00 | | | | | | | | |--- feature_0 <= 158602.50 | | | | | | | | | |--- feature_1 <= 157387.50 | | | | | | | | | | |--- class: LEAVE | | | | | | | | | |--- feature_1 > 157387.50 | | | | | | | | | | |--- class: LEAVE | | | | | | | | |--- feature_0 > 158602.50 | | | | | | | | | |--- class: STAY | | | | | |--- feature_0 > 158635.00 | | | | | | |--- class: LEAVE | | | | |--- feature_1 > 257109.50 | | | | | |--- feature_1 <= 263053.00 | | | | | | |--- class: STAY | | | | | |--- feature_1 > 263053.00 | | | | | | |--- feature_1 <= 269738.50 | | | | | | | |--- class: LEAVE | | | | | | |--- feature_1 > 269738.50 | | | | | | | |--- feature_1 <= 270463.00 | | | | | | | | |--- class: STAY | | | | | | | |--- feature_1 > 270463.00 | | | | | | | | |--- feature_0 <= 154961.00 | | | | | | | | | |--- feature_0 <= 152336.00 | | | | | | | | | | |--- class: STAY | | | | | | | | | |--- feature_0 > 152336.00 | | | | | | | | | | |--- class: LEAVE | | | | | | | | |--- feature_0 > 154961.00 | | | | | | | | | |--- feature_0 <= 156046.00 | | | | | | | | | | |--- class: STAY | | | | | | | | | |--- feature_0 > 156046.00 | | | | | | | | | | |--- class: LEAVE | | |--- feature_1 > 316869.50 | | | |--- feature_0 <= 46699.50 | | | | |--- class: STAY | | | |--- feature_0 > 46699.50 | | | | |--- feature_0 <= 50749.00 | | | | | |--- class: LEAVE | | | | |--- feature_0 > 50749.00 | | | | | |--- feature_0 <= 141356.00 | | | | | | |--- feature_1 <= 318490.50 | | | | | | | |--- feature_0 <= 60323.00 | | | | | | | | |--- class: LEAVE | | | | | | | |--- feature_0 > 60323.00 | | | | | | | | |--- feature_0 <= 130833.50 | | | | | | | | | |--- class: STAY | | | | | | | | |--- feature_0 > 130833.50 | | | | | | | | | |--- feature_1 <= 317773.50 | | | | | | | | | | |--- class: LEAVE | | | | | | | | | |--- feature_1 > 317773.50 | | | | | | | | | | |--- class: STAY | | 
| | | | |--- feature_1 > 318490.50 | | | | | | | |--- class: STAY | | | | | |--- feature_0 > 141356.00 | | | | | | |--- class: LEAVE | |--- feature_1 > 319103.00 | | |--- feature_0 <= 20181.00 | | | |--- class: STAY | | |--- feature_0 > 20181.00 | | | |--- feature_0 <= 21461.50 | | | | |--- feature_1 <= 569388.00 | | | | | |--- feature_0 <= 20312.00 | | | | | | |--- feature_0 <= 20255.50 | | | | | | | |--- class: LEAVE | | | | | | |--- feature_0 > 20255.50 | | | | | | | |--- class: STAY | | | | | |--- feature_0 > 20312.00 | | | | | | |--- class: LEAVE | | | | |--- feature_1 > 569388.00 | | | | | |--- feature_0 <= 20402.00 | | | | | | |--- class: LEAVE | | | | | |--- feature_0 > 20402.00 | | | | | | |--- class: STAY | | | |--- feature_0 > 21461.50 | | | | |--- feature_0 <= 21608.50 | | | | | |--- class: STAY | | | | |--- feature_0 > 21608.50 | | | | | |--- feature_1 <= 326096.50 | | | | | | |--- feature_0 <= 89423.50 | | | | | | | |--- feature_0 <= 31392.50 | | | | | | | | |--- class: LEAVE | | | | | | | |--- feature_0 > 31392.50 | | | | | | | | |--- feature_1 <= 323587.00 | | | | | | | | | |--- feature_1 <= 323077.50 | | | | | | | | | | |--- class: LEAVE | | | | | | | | | |--- feature_1 > 323077.50 | | | | | | | | | | |--- class: STAY | | | | | | | | |--- feature_1 > 323587.00 | | | | | | | | | |--- feature_1 <= 325306.50 | | | | | | | | | | |--- class: LEAVE | | | | | | | | | |--- feature_1 > 325306.50 | | | | | | | | | | |--- class: LEAVE | | | | | | |--- feature_0 > 89423.50 | | | | | | | |--- feature_1 <= 319821.00 | | | | | | | | |--- class: STAY | | | | | | | |--- feature_1 > 319821.00 | | | | | | | | |--- class: LEAVE | | | | | |--- feature_1 > 326096.50 | | | | | | |--- feature_1 <= 334520.00 | | | | | | | |--- feature_0 <= 104510.00 | | | | | | | | |--- feature_0 <= 63982.00 | | | | | | | | | |--- feature_1 <= 327946.00 | | | | | | | | | | |--- class: LEAVE | | | | | | | | | |--- feature_1 > 327946.00 | | | | | | | | | | |--- class: STAY | | | | | | | | 
|--- feature_0 > 63982.00 | | | | | | | | | |--- class: LEAVE | | | | | | | |--- feature_0 > 104510.00 | | | | | | | | |--- feature_1 <= 332032.50 | | | | | | | | | |--- class: STAY | | | | | | | | |--- feature_1 > 332032.50 | | | | | | | | | |--- feature_1 <= 332684.00 | | | | | | | | | | |--- class: LEAVE | | | | | | | | | |--- feature_1 > 332684.00 | | | | | | | | | | |--- class: LEAVE | | | | | | |--- feature_1 > 334520.00 | | | | | | | |--- feature_1 <= 342372.00 | | | | | | | | |--- feature_1 <= 340083.50 | | | | | | | | | |--- feature_1 <= 340010.50 | | | | | | | | | | |--- class: LEAVE | | | | | | | | | |--- feature_1 > 340010.50 | | | | | | | | | | |--- class: STAY | | | | | | | | |--- feature_1 > 340083.50 | | | | | | | | | |--- feature_0 <= 31832.00 | | | | | | | | | | |--- class: STAY | | | | | | | | | |--- feature_0 > 31832.00 | | | | | | | | | | |--- class: LEAVE | | | | | | | |--- feature_1 > 342372.00 | | | | | | | | |--- feature_0 <= 21932.00 | | | | | | | | | |--- class: LEAVE | | | | | | | | |--- feature_0 > 21932.00 | | | | | | | | | |--- feature_0 <= 22349.50 | | | | | | | | | | |--- class: STAY | | | | | | | | | |--- feature_0 > 22349.50 | | | | | | | | | | |--- class: LEAVE |--- feature_1 > 600255.50 | |--- feature_0 <= 99826.00 | | |--- feature_1 <= 972711.50 | | | |--- feature_1 <= 924480.50 | | | | |--- feature_1 <= 817165.50 | | | | | |--- feature_1 <= 816584.00 | | | | | | |--- feature_0 <= 20664.00 | | | | | | | |--- class: STAY | | | | | | |--- feature_0 > 20664.00 | | | | | | | |--- feature_0 <= 20737.00 | | | | | | | | |--- class: LEAVE | | | | | | | |--- feature_0 > 20737.00 | | | | | | | | |--- feature_0 <= 21765.50 | | | | | | | | | |--- class: STAY | | | | | | | | |--- feature_0 > 21765.50 | | | | | | | | | |--- feature_0 <= 21932.00 | | | | | | | | | | |--- class: LEAVE | | | | | | | | | |--- feature_0 > 21932.00 | | | | | | | | | | |--- class: STAY | | | | | |--- feature_1 > 816584.00 | | | | | | |--- class: LEAVE | | | | |--- 
feature_1 > 817165.50 | | | | | |--- feature_0 <= 20781.50 | | | | | | |--- feature_1 <= 857573.50 | | | | | | | |--- class: STAY | | | | | | |--- feature_1 > 857573.50 | | | | | | | |--- class: LEAVE | | | | | |--- feature_0 > 20781.50 | | | | | | |--- feature_1 <= 831993.00 | | | | | | | |--- feature_0 <= 93237.50 | | | | | | | | |--- class: STAY | | | | | | | |--- feature_0 > 93237.50 | | | | | | | | |--- class: LEAVE | | | | | | |--- feature_1 > 831993.00 | | | | | | | |--- feature_0 <= 27323.50 | | | | | | | | |--- feature_1 <= 850244.00 | | | | | | | | | |--- feature_0 <= 22963.00 | | | | | | | | | | |--- class: LEAVE | | | | | | | | | |--- feature_0 > 22963.00 | | | | | | | | | | |--- class: STAY | | | | | | | | |--- feature_1 > 850244.00 | | | | | | | | | |--- class: STAY | | | | | | | |--- feature_0 > 27323.50 | | | | | | | | |--- feature_0 <= 86041.50 | | | | | | | | | |--- feature_1 <= 834707.00 | | | | | | | | | | |--- class: LEAVE | | | | | | | | | |--- feature_1 > 834707.00 | | | | | | | | | | |--- class: STAY | | | | | | | | |--- feature_0 > 86041.50 | | | | | | | | | |--- feature_1 <= 876145.00 | | | | | | | | | | |--- class: STAY | | | | | | | | | |--- feature_1 > 876145.00 | | | | | | | | | | |--- class: STAY | | | |--- feature_1 > 924480.50 | | | | |--- feature_1 <= 934710.50 | | | | | |--- feature_1 <= 925641.50 | | | | | | |--- class: LEAVE | | | | | |--- feature_1 > 925641.50 | | | | | | |--- feature_0 <= 80700.50 | | | | | | | |--- feature_0 <= 57594.50 | | | | | | | | |--- feature_0 <= 52208.00 | | | | | | | | | |--- feature_1 <= 927511.50 | | | | | | | | | | |--- class: STAY | | | | | | | | | |--- feature_1 > 927511.50 | | | | | | | | | | |--- class: LEAVE | | | | | | | | |--- feature_0 > 52208.00 | | | | | | | | | |--- class: LEAVE | | | | | | | |--- feature_0 > 57594.50 | | | | | | | | |--- class: STAY | | | | | | |--- feature_0 > 80700.50 | | | | | | | |--- feature_1 <= 926780.50 | | | | | | | | |--- class: STAY | | | | | | | |--- 
feature_1 > 926780.50 | | | | | | | | |--- class: LEAVE | | | | |--- feature_1 > 934710.50 | | | | | |--- feature_0 <= 74381.50 | | | | | | |--- feature_1 <= 943462.00 | | | | | | | |--- feature_0 <= 46470.00 | | | | | | | | |--- class: STAY | | | | | | | |--- feature_0 > 46470.00 | | | | | | | | |--- feature_0 <= 55848.00 | | | | | | | | | |--- class: LEAVE | | | | | | | | |--- feature_0 > 55848.00 | | | | | | | | | |--- class: STAY | | | | | | |--- feature_1 > 943462.00 | | | | | | | |--- feature_0 <= 61411.00 | | | | | | | | |--- feature_0 <= 44500.50 | | | | | | | | | |--- feature_0 <= 22140.50 | | | | | | | | | | |--- class: LEAVE | | | | | | | | | |--- feature_0 > 22140.50 | | | | | | | | | | |--- class: STAY | | | | | | | | |--- feature_0 > 44500.50 | | | | | | | | | |--- class: STAY | | | | | | | |--- feature_0 > 61411.00 | | | | | | | | |--- feature_0 <= 65466.50 | | | | | | | | | |--- class: LEAVE | | | | | | | | |--- feature_0 > 65466.50 | | | | | | | | | |--- feature_0 <= 71768.00 | | | | | | | | | | |--- class: STAY | | | | | | | | | |--- feature_0 > 71768.00 | | | | | | | | | | |--- class: LEAVE | | | | | |--- feature_0 > 74381.50 | | | | | | |--- feature_1 <= 959715.00 | | | | | | | |--- class: STAY | | | | | | |--- feature_1 > 959715.00 | | | | | | | |--- feature_1 <= 960980.00 | | | | | | | | |--- class: LEAVE | | | | | | | |--- feature_1 > 960980.00 | | | | | | | | |--- class: STAY | | |--- feature_1 > 972711.50 | | | |--- feature_1 <= 976372.00 | | | | |--- class: STAY | | | |--- feature_1 > 976372.00 | | | | |--- feature_1 <= 976703.00 | | | | | |--- class: LEAVE | | | | |--- feature_1 > 976703.00 | | | | | |--- feature_1 <= 982699.50 | | | | | | |--- class: STAY | | | | | |--- feature_1 > 982699.50 | | | | | | |--- feature_1 <= 983115.00 | | | | | | | |--- class: LEAVE | | | | | | |--- feature_1 > 983115.00 | | | | | | | |--- feature_1 <= 998523.00 | | | | | | | | |--- feature_1 <= 998339.00 | | | | | | | | | |--- feature_0 <= 85411.50 | | | | 
| | | | | | |--- class: STAY | | | | | | | | | |--- feature_0 > 85411.50 | | | | | | | | | | |--- class: STAY | | | | | | | | |--- feature_1 > 998339.00 | | | | | | | | | |--- class: LEAVE | | | | | | | |--- feature_1 > 998523.00 | | | | | | | | |--- class: STAY | |--- feature_0 > 99826.00 | | |--- feature_0 <= 158903.50 | | | |--- feature_1 <= 680879.50 | | | | |--- feature_1 <= 664908.50 | | | | | |--- feature_0 <= 145073.50 | | | | | | |--- feature_0 <= 132494.50 | | | | | | | |--- feature_0 <= 128797.50 | | | | | | | | |--- feature_0 <= 127959.00 | | | | | | | | | |--- feature_1 <= 657127.50 | | | | | | | | | | |--- class: LEAVE | | | | | | | | | |--- feature_1 > 657127.50 | | | | | | | | | | |--- class: LEAVE | | | | | | | | |--- feature_0 > 127959.00 | | | | | | | | | |--- class: STAY | | | | | | | |--- feature_0 > 128797.50 | | | | | | | | |--- class: LEAVE | | | | | | |--- feature_0 > 132494.50 | | | | | | | |--- feature_0 <= 134659.50 | | | | | | | | |--- class: STAY | | | | | | | |--- feature_0 > 134659.50 | | | | | | | | |--- feature_0 <= 135359.50 | | | | | | | | | |--- class: LEAVE | | | | | | | | |--- feature_0 > 135359.50 | | | | | | | | | |--- feature_1 <= 621022.50 | | | | | | | | | | |--- class: STAY | | | | | | | | | |--- feature_1 > 621022.50 | | | | | | | | | | |--- class: STAY | | | | | |--- feature_0 > 145073.50 | | | | | | |--- feature_1 <= 656568.00 | | | | | | | |--- feature_1 <= 649088.00 | | | | | | | | |--- feature_1 <= 631162.50 | | | | | | | | | |--- feature_0 <= 156795.50 | | | | | | | | | | |--- class: LEAVE | | | | | | | | | |--- feature_0 > 156795.50 | | | | | | | | | | |--- class: LEAVE | | | | | | | | |--- feature_1 > 631162.50 | | | | | | | | | |--- class: LEAVE | | | | | | | |--- feature_1 > 649088.00 | | | | | | | | |--- class: STAY | | | | | | |--- feature_1 > 656568.00 | | | | | | | |--- class: LEAVE | | | | |--- feature_1 > 664908.50 | | | | | |--- feature_0 <= 113603.00 | | | | | | |--- feature_0 <= 111235.00 | | | | | | 
| |--- class: LEAVE | | | | | | |--- feature_0 > 111235.00 | | | | | | | |--- class: STAY | | | | | |--- feature_0 > 113603.00 | | | | | | |--- class: LEAVE | | | |--- feature_1 > 680879.50 | | | | |--- feature_1 <= 686139.00 | | | | | |--- feature_0 <= 115262.50 | | | | | | |--- feature_0 <= 109257.50 | | | | | | | |--- class: STAY | | | | | | |--- feature_0 > 109257.50 | | | | | | | |--- class: LEAVE | | | | | |--- feature_0 > 115262.50 | | | | | | |--- class: STAY | | | | |--- feature_1 > 686139.00 | | | | | |--- feature_1 <= 688547.00 | | | | | | |--- class: LEAVE | | | | | |--- feature_1 > 688547.00 | | | | | | |--- feature_1 <= 692485.50 | | | | | | | |--- feature_0 <= 113341.00 | | | | | | | | |--- class: LEAVE | | | | | | | |--- feature_0 > 113341.00 | | | | | | | | |--- class: STAY | | | | | | |--- feature_1 > 692485.50 | | | | | | | |--- feature_1 <= 725395.00 | | | | | | | | |--- feature_1 <= 719464.50 | | | | | | | | | |--- feature_1 <= 717710.00 | | | | | | | | | | |--- class: LEAVE | | | | | | | | | |--- feature_1 > 717710.00 | | | | | | | | | | |--- class: STAY | | | | | | | | |--- feature_1 > 719464.50 | | | | | | | | | |--- class: LEAVE | | | | | | | |--- feature_1 > 725395.00 | | | | | | | | |--- feature_1 <= 725692.00 | | | | | | | | | |--- class: STAY | | | | | | | | |--- feature_1 > 725692.00 | | | | | | | | | |--- feature_0 <= 141221.50 | | | | | | | | | | |--- class: LEAVE | | | | | | | | | |--- feature_0 > 141221.50 | | | | | | | | | | |--- class: STAY | | |--- feature_0 > 158903.50 | | | |--- feature_1 <= 821752.50 | | | | |--- class: LEAVE | | | |--- feature_1 > 821752.50 | | | | |--- feature_1 <= 843422.50 | | | | | |--- class: STAY | | | | |--- feature_1 > 843422.50 | | | | | |--- class: LEAVE
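The printout above labels the splits `feature_0` and `feature_1` because `export_text` was called without feature names. A small sketch of passing the column names instead, using a hypothetical two-column frame (`X_demo`, `small_tree`) rather than the notebook's data:

```python
import numpy as np
import pandas as pd
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in data with the same column names as the two-predictor model
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    "income": rng.uniform(20_000, 160_000, 300),
    "house": rng.uniform(150_000, 1_000_000, 300),
})
# Assumed toy rule for illustration only: the label depends on house value alone
y_demo = np.where(X_demo["house"] > 600_000, "STAY", "LEAVE")

small_tree = DecisionTreeClassifier(criterion="entropy", max_depth=2, random_state=0)
small_tree = small_tree.fit(X_demo, y_demo)

# feature_names replaces feature_0 / feature_1 with readable column names
tree_text = tree.export_text(small_tree, feature_names=list(X_demo.columns))
print(tree_text)
```

`export_text` also takes its own `max_depth` argument (default 10), which only controls how much of the tree is printed. That is why very deep trees show lines such as "truncated branch of depth 9".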
Is this accuracy better than making a random guess? (check the distribution above)
pred = money_tree2.predict(x)
print("Accuracy:",metrics.accuracy_score(y, pred))
Accuracy: 0.6657989587505005
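To judge whether an accuracy figure beats a naive guess, compare it with the share of the majority class, which is the accuracy of always predicting that class. A minimal sketch on a made-up five-row target (`y_demo` is hypothetical; in the notebook the same calls would apply to the real target column):

```python
import pandas as pd

# Hypothetical target values standing in for the real 'leave' column
y_demo = pd.Series(["LEAVE", "STAY", "STAY", "LEAVE", "STAY"])

# Proportion of each class; the largest one is the majority-class baseline
class_share = y_demo.value_counts(normalize=True)
majority_class = class_share.idxmax()
baseline_accuracy = class_share.max()

print("Majority class:", majority_class)
print("Baseline accuracy:", baseline_accuracy)
```

A model is only useful if its accuracy clearly exceeds this baseline.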
from io import StringIO
from IPython.display import Image
import pydotplus
dot_data = StringIO()
export_graphviz(money_tree2, out_file=dot_data,
feature_names=x.columns,class_names=['leave','stay'],
filled=True,rounded=True, precision =2)
graph=pydotplus.graph_from_dot_data(dot_data.getvalue())
Image(graph.create_png())
Use all of the independent attributes. We'll call this the "full tree."
What is the accuracy of the full tree?
# split the dataframe into independent (x) and dependent (predicted) attributes (y)
x = df_clean[['income','house','college','overage','leftover','handset_price','over_15mins_calls_per_month','average_call_duration','reported_satisfaction','reported_usage_level','considering_change_of_plan']]
y = df_clean['leave']
full_tree = DecisionTreeClassifier(criterion="entropy", max_depth=1)
# Fit the decision tree classifier
full_tree = full_tree.fit(x,y)
from io import StringIO
from IPython.display import Image
import pydotplus
dot_data = StringIO()
export_graphviz(full_tree, out_file=dot_data,
feature_names=x.columns,class_names=['leave','stay'],
filled=True,rounded=True, precision =2)
graph=pydotplus.graph_from_dot_data(dot_data.getvalue())
Image(graph.create_png())
pred = full_tree.predict(x)
#print(pred)
print("Accuracy:",metrics.accuracy_score(y, pred))
Accuracy: 0.6215458550260312
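One way to see which predictors a fitted tree relies on is its `feature_importances_` attribute, which sums the impurity reduction contributed by each feature and normalizes the result to 1. The sketch below uses synthetic data with an assumed rule (the label depends only on `overage`), so the column values, `demo_tree`, and the resulting ranking are illustrative and say nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in predictors (assumption): names borrowed from the survey, values invented
rng = np.random.default_rng(1)
X_demo = pd.DataFrame({
    "income": rng.uniform(20_000, 160_000, 500),
    "house": rng.uniform(150_000, 1_000_000, 500),
    "overage": rng.uniform(0, 300, 500),
    "noise": rng.uniform(0, 1, 500),
})
# Assumed toy rule: customers with high overage leave
y_demo = np.where(X_demo["overage"] > 150, "LEAVE", "STAY")

demo_tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
demo_tree = demo_tree.fit(X_demo, y_demo)

# Pair each importance with its column name and sort from most to least important
importances = pd.Series(demo_tree.feature_importances_, index=X_demo.columns)
importances = importances.sort_values(ascending=False)
top3 = importances.head(3)
print(top3)
```

Note that a tree with max_depth=1 makes a single split, so it can assign non-zero importance to only one predictor. A deeper tree is needed before a top-3 ranking is meaningful.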
Now we will split the dataset into 80% training data and 20% test data
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
train_tree = DecisionTreeClassifier(criterion="entropy", max_depth=5)
# Fit the decision tree classifier on the training data
train_tree = train_tree.fit(x_train,y_train)
pred = train_tree.predict(x_test)
print("Accuracy:",metrics.accuracy_score(y_test, pred))
Accuracy: 0.6936936936936937
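Because `train_test_split` shuffles the rows, the accuracy above will change from run to run unless the split is fixed. A sketch of a reproducible, class-balanced split using the `random_state` and `stratify` parameters; the toy arrays and the seed value 42 are arbitrary choices for illustration:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-in data (assumption): 100 rows, 40% LEAVE and 60% STAY
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.array(["LEAVE"] * 40 + ["STAY"] * 60)

# random_state fixes the shuffle; stratify keeps the class proportions in both parts
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

leave_share_test = np.mean(y_te == "LEAVE")
print("Train rows:", len(X_tr), "Test rows:", len(X_te))
print("LEAVE share in test:", leave_share_test)
```

With a fixed split, differences in accuracy between models reflect the models rather than which rows happened to land in the test set.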
from io import StringIO
from IPython.display import Image
import pydotplus
dot_data = StringIO()
export_graphviz(train_tree, out_file=dot_data,
feature_names=x.columns,class_names=['leave','stay'],
filled=True,rounded=True, precision =2)
graph=pydotplus.graph_from_dot_data(dot_data.getvalue())
Image(graph.create_png())
➡️ Assignment Tasks
train_tree = DecisionTreeClassifier(criterion="entropy", max_depth=2)
# Fit the decision tree classifier on the training data
train_tree = train_tree.fit(x_train,y_train)
pred = train_tree.predict(x_test)
print("Accuracy:",metrics.accuracy_score(y_test, pred))
Accuracy: 0.6376376376376376
Roughly halving max_depth (from 5 to 2) decreased test accuracy from 0.694 to 0.638. This is likely because the shallower tree sorts customers into fewer, larger groups and misses real structure in the data (underfitting).
➡️ Assignment Tasks
train_tree = DecisionTreeClassifier(criterion="entropy", max_depth=10)
# Fit the decision tree classifier on the training data
train_tree = train_tree.fit(x_train,y_train)
pred = train_tree.predict(x_test)
print("Accuracy:",metrics.accuracy_score(y_test, pred))
Accuracy: 0.6826826826826827
Doubling max_depth (from 5 to 10) also decreased test accuracy, from 0.694 to 0.683. This is likely because the deeper tree sorts customers into many small groups that partly fit noise in the training data (overfitting).
➡️ Assignment Tasks
train_tree = DecisionTreeClassifier(criterion="entropy", max_depth=12)
# Fit the decision tree classifier on the training data
train_tree = train_tree.fit(x_train,y_train)
pred = train_tree.predict(x_test)
print("Accuracy:",metrics.accuracy_score(y_test, pred))
Accuracy: 0.6656656656656657
train_tree = DecisionTreeClassifier(criterion="entropy", max_depth=20)
# Fit the decision tree classifier on the training data
train_tree = train_tree.fit(x_train,y_train)
pred = train_tree.predict(x_test)
print("Accuracy:",metrics.accuracy_score(y_test, pred))
Accuracy: 0.6346346346346347
train_tree = DecisionTreeClassifier(criterion="entropy", max_depth=100)
# Fit the decision tree classifier on the training data
train_tree = train_tree.fit(x_train,y_train)
pred = train_tree.predict(x_test)
print("Accuracy:",metrics.accuracy_score(y_test, pred))
Accuracy: 0.6226226226226226
train_tree = DecisionTreeClassifier(criterion="entropy", max_depth=1000)
# Fit the decision tree classifier on the training data
train_tree = train_tree.fit(x_train,y_train)
pred = train_tree.predict(x_test)
print("Accuracy:",metrics.accuracy_score(y_test, pred))
Accuracy: 0.6246246246246246
train_tree = DecisionTreeClassifier(criterion="entropy", max_depth=11)
# Fit the decision tree classifier on the training data
train_tree = train_tree.fit(x_train,y_train)
pred = train_tree.predict(x_test)
print("Accuracy:",metrics.accuracy_score(y_test, pred))
Accuracy: 0.6806806806806807
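Increasing max_depth beyond about 10 to 12 lowers test accuracy: 0.666 at 12, 0.635 at 20, and about 0.62 at 100 and 1000. Among the deeper trees, max_depth of 10 or 11 does best (about 0.68), although the depth-5 tree above still scored higher (0.694).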
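Rather than refitting one cell per depth, the comparison can be run in a loop that records training and test accuracy side by side. The sketch below does this on synthetic data with a noisy signal. `X_demo`, `y_demo`, the depth list, and the `results` dictionary are assumptions for illustration, not the notebook's `x_train` and `x_test`.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn import metrics

# Synthetic stand-in data (assumption): label depends on two of five features plus noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(2000, 5))
signal = X_demo[:, 0] + X_demo[:, 1] + rng.normal(scale=1.0, size=2000)
y_demo = np.where(signal > 0, "LEAVE", "STAY")

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

results = {}
for depth in [2, 5, 10, 20, None]:  # None means no depth limit
    model = DecisionTreeClassifier(criterion="entropy", max_depth=depth, random_state=0)
    model = model.fit(X_tr, y_tr)
    train_acc = metrics.accuracy_score(y_tr, model.predict(X_tr))
    test_acc = metrics.accuracy_score(y_te, model.predict(X_te))
    results[depth] = (train_acc, test_acc)
    print(f"max_depth={str(depth):>4}  train={train_acc:.3f}  test={test_acc:.3f}")
```

Training accuracy keeps rising with depth, so it is the test column that shows where extra depth stops helping.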
➡️ Assignment Tasks
train_tree = DecisionTreeClassifier(criterion="entropy", max_leaf_nodes=10, max_depth=5)
# Fit the decision tree classifier on the training data
train_tree = train_tree.fit(x_train,y_train)
pred = train_tree.predict(x_test)
print("Accuracy:",metrics.accuracy_score(y_test, pred))
Accuracy: 0.7117117117117117
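When `max_leaf_nodes` and `max_depth` are set together, both limits apply, and the tree stops growing at whichever is reached first. A sketch that inspects the resulting tree size with `get_n_leaves()` and `get_depth()`, again on synthetic stand-in data (`X_demo`, `y_demo`, `capped_tree`, and `depth_only_tree` are hypothetical names):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): noisy interaction between two features
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 4))
signal = X_demo[:, 0] * X_demo[:, 1] + rng.normal(scale=0.5, size=1000)
y_demo = np.where(signal > 0, "LEAVE", "STAY")

# Same settings as the cell above: at most 10 leaves and at most 5 levels
capped_tree = DecisionTreeClassifier(
    criterion="entropy", max_leaf_nodes=10, max_depth=5, random_state=0
).fit(X_demo, y_demo)

# Depth limit only: up to 2**5 = 32 leaves are possible
depth_only_tree = DecisionTreeClassifier(
    criterion="entropy", max_depth=5, random_state=0
).fit(X_demo, y_demo)

print("capped:     leaves =", capped_tree.get_n_leaves(), " depth =", capped_tree.get_depth())
print("depth only: leaves =", depth_only_tree.get_n_leaves(), " depth =", depth_only_tree.get_depth())
```

With `max_leaf_nodes` set, scikit-learn grows the tree best-first, expanding the leaf with the largest impurity reduction next. The 10 leaves therefore go to the most informative splits, which may be why the capped tree scored slightly higher here.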